git version 2.39.5 (Apple Git-154)
Choose the right first language. Python didn’t work for me in the beginning.
Try a lot of EDA (exploratory data analysis).
No need to hurry. You probably don’t know what you don’t know because you don’t have to.
These slides cannot cover everything you can do with Git. We will only cover version control, collaboration, and GitHub Pages. But I did bring a gift on the last slide :)
Git is a distributed version control system that tracks versions of files. It is often used to control source code by programmers who are developing software collaboratively.
Git creates a detailed history of your directory (folder). GitHub is where you post that history, or where you download resources from. You can share your own work or copy others’ work.
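Windows: open Git Bash from the Windows toolbar.
Mac: press Cmd + spacebar and search for the terminal.
If Git is properly installed, running git --version in the terminal (shell) prints the installed version, for example git version 2.39.5 (Apple Git-154).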
Set user.name and user.email to the ones you have in your GitHub account.
A single dash (-) and a double dash (--) introduce options.
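The identity setup can be sketched as follows. The name and email are placeholders, and the GIT_CONFIG_GLOBAL lines are an assumption added only so that trying the example does not touch your real settings (the variable is honored by Git 2.32 and later).

```shell
# Assumption: redirect "global" settings to a scratch file for this demo.
# Remove the next two lines to configure Git for real.
GIT_CONFIG_GLOBAL="$(mktemp)"
export GIT_CONFIG_GLOBAL

# Placeholder identity: replace with the name and email of your GitHub account.
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Read the values back to confirm they were stored.
git config --global user.name
git config --global user.email
```

Here --global is a double-dash (long) option; a single-dash option such as -m in git commit -m is the short form.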
Check the current working directory with pwd, set it to Desktop if it isn’t, and create a practice directory.
Usually a project directory would have multiple subdirectories including codes, doc, data, figures, and temp.
Tip) cd .. leads to the outer (parent) folder that contains the current working directory.
Run git init, and check it with Cmd+Shift+. (the shortcut that shows hidden items such as the .git folder in Finder).
Tipbox
Directory vs Repository
These two are nearly identical in normal use. Locally, both refer to the folder of interest on a computer. However, a repository is a directory with Git activated. Thus, a folder on GitHub is usually called a repo rather than a directory, as it always involves Git.
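The directory setup and git init steps above can be sketched as follows. This is a minimal sketch: a scratch folder stands in for Desktop (an assumption made so the example can be repeated safely), and the subdirectory names are the ones listed earlier.

```shell
# Assumption: a scratch folder stands in for Desktop;
# on your machine you would run `cd ~/Desktop` instead.
base="$(mktemp -d)"
cd "$base"
pwd                 # confirm the current working directory

# Create the practice directory with the usual subdirectories.
mkdir -p practice/codes practice/doc practice/data practice/figures practice/temp
cd practice

git init            # turns the directory into a repository
ls -a               # the hidden .git folder should now be listed
```

The only difference between the directory and the repository is the .git folder that git init creates.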
README file
Most projects include a README file to explain the content of the project and what’s in the repo.
Write “This is a practice repo of Git”, and close it with Ctrl(Cmd) + X and enter.
Tipbox
Terminal vs Shell
You might have come across these two words. Put simply, a shell is a program that wraps around the core of the operating system (the kernel). In other words, it is a translator between you and the computer that gives you access to the computer’s core functions. A shell can take the form of a Command Line Interface (CLI) or a Graphical User Interface (GUI), but usually, when we say shell, we mean a CLI. A terminal is the interface program through which you interact with the shell. You can consider it an emulated CLI on a GUI or, more accurately, a TUI (text-based user interface). An epitome of a GUI is the Windows File Explorer.
html output with Jupyter Notebook
Let’s create an html output with either Python (.ipynb) or R (.rmd). Save it as output.html in the doc subdirectory.
# Practice with the `penguins` dataset
## Cell one : loading data
import seaborn as sns
import pandas as pd
df = sns.load_dataset('penguins')
df.head(3)
  species     island  bill_length_mm  ...  flipper_length_mm  body_mass_g     sex
0  Adelie  Torgersen            39.1  ...              181.0       3750.0    Male
1  Adelie  Torgersen            39.5  ...              186.0       3800.0  Female
2  Adelie  Torgersen            40.3  ...              195.0       3250.0  Female
[3 rows x 7 columns]
## Cell two : processing data
df = df.dropna(subset = ['sex', 'body_mass_g'], how = 'any', axis = 0)
## Cell three : plotting figure
import matplotlib.pyplot as plt
fig = plt.figure(figsize = (9,5))
ax1 = fig.add_subplot(1,2,1)
ax2 = fig.add_subplot(1,2,2)
df.plot(kind='scatter', x='flipper_length_mm', y='body_mass_g', c='coral', ax=ax1)
sns.regplot(x='flipper_length_mm', y='body_mass_g', data=df, scatter_kws={"alpha": 0.5}, ax=ax2)
plt.show()

R (.rmd) version
# Cell one: Load required libraries
library(tidyverse)
library(palmerpenguins)
library(patchwork)
# Cell two: Load dataset
df <- penguins
head(df, 3)
# A tibble: 3 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
# ℹ 2 more variables: sex <fct>, year <int>
# Cell three: Process data
df <- df %>% filter(!is.na(sex), !is.na(body_mass_g))
# Cell four: Plotting figure
p1 <- ggplot(df, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(color = "coral") +
  labs(title = "Scatter Plot")
p2 <- ggplot(df, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm") +
  labs(title = "Regression Plot")
p1 + p2

Image from Dev Community
Then, all the files are recorded as part of your commit history. Check it with git log.
In git log, the hash is the ID unique to a particular commit.
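A minimal sketch of how files end up in the commit history, assuming the usual git add and git commit sequence. The scratch repo, the identity, and the commit message are placeholders chosen for illustration.

```shell
# Scratch repository so the example can be run anywhere.
repo="$(mktemp -d)"
cd "$repo"
git init
git config user.name "Practice User"          # placeholder identity for this repo only
git config user.email "practice@example.com"

printf 'This is a practice repo of Git\n' > README.txt

git status --short          # README.txt is listed as untracked (??)
git add .                   # stage everything in the directory
git commit -m "Add README"  # record the staged snapshot
git log --oneline           # one line per commit: short hash and message
```

After the commit, git status reports a clean working tree, and git log shows the new commit with its hash.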
Make a slight modification to the README.txt file, and let’s check whether Git spots the difference.
Tip) Try to leave the staging area clean.
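One way to see Git spot a modification, sketched in a scratch repo. git status and git diff are standard commands, while the file contents and commit messages here are made up for the example.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init
git config user.name "Practice User"          # placeholder identity
git config user.email "practice@example.com"
printf 'This is a practice repo of Git\n' > README.txt
git add README.txt
git commit -m "Add README"

# Modify the tracked file and ask Git what changed.
printf 'One more line\n' >> README.txt
git status --short     # README.txt is flagged as modified
git diff               # shows the added line

# Stage and commit right away so the staging area is left clean.
git add README.txt
git commit -m "Update README"
git status --short     # prints nothing when everything is committed
```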
git log can include multiple options.
git log -3 # restricts the number of commits to three (most recent)
git log -3 README.txt # displays log of only README.txt
git log --since='Month Day Year' # displays only the commits made after the specified date
git log --since='M D Y' --until='M D Y' # displays only the commits made between the two dates
Unstage files with git reset HEAD or git reset HEAD file_name
Why call HEAD?
You might wonder why we have to call HEAD when HEAD simply refers to the last commit. This is because unstaging does not technically empty the staging area. It resets the staging area so that it matches the last commit, and HEAD names the commit to match.
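The idea that unstaging means matching the staging area to HEAD can be checked directly. A minimal sketch in a scratch repo, with placeholder file contents:

```shell
repo="$(mktemp -d)"
cd "$repo"
git init
git config user.name "Practice User"          # placeholder identity
git config user.email "practice@example.com"
printf 'This is a practice repo of Git\n' > README.txt
git add README.txt
git commit -m "Add README"

printf 'Draft line\n' >> README.txt
git add README.txt
git diff --cached --name-only   # README.txt: the staging area differs from HEAD

git reset HEAD README.txt       # make the staged version match the last commit
git diff --cached --name-only   # prints nothing: the staging area equals HEAD
git status --short              # the edit itself is still in the working file
```

git diff --cached compares the staging area with the last commit, so an empty result after the reset shows that the two match while the file on disk keeps the draft line.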
Discard uncommitted changes in the working directory with git checkout . or git checkout -- file_name
Revert a commit with git revert HEAD or git revert hash
What is Reverting?
Sometimes we commit files that contain an error and only then spot the issue. Naturally, we need a command that restores the repo to the state before that commit and records the restored state as a new commit. This is what git revert does. The git revert command also opens a text editor in the shell so that you can add a commit message. However, it is a command that often generates conflicts. I do not recommend git revert unless reverting is unavoidable. If the command does not feel intuitive, ignore it. Totally fine.
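A small sketch of discarding an uncommitted edit and then reverting a commit. The --no-edit flag is used here only to skip the editor and accept the default message so the example runs unattended. File contents and messages are placeholders.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init
git config user.name "Practice User"          # placeholder identity
git config user.email "practice@example.com"
printf 'This is a practice repo of Git\n' > README.txt
git add README.txt
git commit -m "Add README"

# Commit a change that turns out to be a mistake.
printf 'A line with an error\n' >> README.txt
git commit -am "Add a faulty line"

# Discard an edit that was never committed.
printf 'Scratch edit\n' >> README.txt
git checkout -- README.txt      # the file goes back to the last committed version

# Undo the faulty commit by adding a new commit that reverses it.
git revert --no-edit HEAD
git log --oneline               # three commits: the original, the faulty one, the revert
cat README.txt                  # only the original line remains
```

Note that the faulty commit stays in the history. git revert adds a commit on top of it instead of deleting it.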
Let’s push the repo and the html file we made to our GitHub account.
1. Click New in the top-left banner.
2. Fill in Repository Name* with the local repo name.
3. Set the repo to Public and don’t add a README file.
4. Click Create Repository.
5. Go to < > Code in the left-top banner.
6. Click the < > Code button in the right-top.
7. Copy the HTTPS URL, which would say [https://github.com/username/practice.git].
8. Run git remote add origin https://github.com/username/practice.git.
9. Run git push -u origin main. This means that we’re pushing our local repo to the origin repo’s main branch. Git might ask you for an ID and password. For the password, please enter your PAT.

Pushing your local repo to a public repo means you share the items and history of your local repo. This calls for a few cautions.
Unless it is unavoidable, don’t include data in what you push, especially raw data. This is not only because data takes a lot of space, but also because most raw data are either publicly available or confidential. If the data is public, you can point readers to the data website in the README.md. If the data is confidential, you have a responsibility not to disclose it. To avoid such issues, you can list the raw data files in the .gitignore file and only commit and push selected files of a repo.
Check whether your code includes personal information. This often happens when the code contains personal API keys. If you upload them, in most cases you will get an email from GitHub saying that your information is exposed. However, avoiding the problem is much easier than fixing it.
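The push steps and the .gitignore caution can be sketched together. This is only a sketch under stated assumptions: a local bare repository stands in for the empty GitHub repo so the example runs without an account, and data/raw.csv is a hypothetical raw data file. With a real account, the remote URL would be the HTTPS URL copied from GitHub.

```shell
base="$(mktemp -d)"
cd "$base"

# Assumption: a local bare repository plays the role of the empty GitHub repo.
git init --bare remote.git

mkdir -p practice/data practice/doc
cd practice
git init
git config user.name "Practice User"          # placeholder identity
git config user.email "practice@example.com"

printf 'This is a practice repo of Git\n' > README.txt
printf 'id,value\n1,42\n' > data/raw.csv      # hypothetical raw data file
printf 'data/\n' > .gitignore                 # keep everything under data/ out of Git

git add .
git commit -m "Initial commit"
git branch -M main                            # make sure the branch is called main

git remote add origin "$base/remote.git"      # on GitHub: the copied HTTPS URL
git push -u origin main

# List what was actually pushed: the data folder is absent.
git ls-tree -r --name-only origin/main
```

Because data/ is listed in .gitignore before the first git add, the raw file never enters the history, which is much easier than removing it after it has been pushed.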
1. Go to the Settings page of the repo in GitHub.
2. Find Pages in the sidebar menu. Click it.
3. In the Build and Deployment setting, set the source to Deploy from a branch and Branch to main and /(root).
4. Open the published page at https://username.github.io/practice/doc/output.html

Let’s practice cloning with the package repo of Callaway and Sant’Anna (2021).
Also give the original repo (on GitHub) a remote name in the cloned repo, so that you can control where you pull from and push to. (It is impossible to push to the original repo without the author’s permission, so just take it as an example.)
With a cloned version of the did package, you can look into the repo on your local computer. However, the author might make changes to the package. You can pull those changes via git pull.
git fetch remote_name: fetches the Git remote with the specified name into the current local repo
git fetch remote_name branch_name: fetches only the specified branch of the specified remote
git merge remote_name/branch_name: merges the fetched remote branch into the current local branch
git pull remote_name branch_name: fetches the remote branch and merges it into the current local branch
Why not clone again?
Cloning is copying the entire repo. Pulling is fetching and merging a ‘branch’ out of a remote repo. Maybe now is the right time to discuss branches.
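Before moving on to branches, the difference between cloning once and then fetching, merging, and pulling can be tried locally. A minimal sketch, assuming a local repo named upstream that stands in for the author’s GitHub repo. NEWS.txt and the commit messages are made up.

```shell
base="$(mktemp -d)"
cd "$base"

# Assumption: "upstream" is a local stand-in for the author's repo.
git init upstream
cd upstream
git config user.name "Package Author"         # placeholder identity
git config user.email "author@example.com"
printf 'version 1\n' > NEWS.txt
git add NEWS.txt
git commit -m "First release"
git branch -M main

cd "$base"
git clone upstream local-copy     # cloning copies the entire repo once

# The author publishes a change after we cloned.
cd "$base/upstream"
printf 'version 2\n' >> NEWS.txt
git commit -am "Second release"

cd "$base/local-copy"
git fetch origin main             # download the new commit without touching local files
git merge origin/main             # bring the fetched commit into the local branch

# Another change, this time fetched and merged in one step.
cd "$base/upstream"
printf 'version 3\n' >> NEWS.txt
git commit -am "Third release"

cd "$base/local-copy"
git pull origin main
git log --oneline                 # all three releases are now in the local history
```

The clone happens once. Every later update travels through fetch and merge, or through pull, which does both.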
git branch                       # displays the branches in the repo (* denotes the current branch)
git branch branch_name           # creates a branch
git branch -m old_name new_name  # renames a branch
git switch branch_name           # moves to the branch
git diff branch1 branch2         # displays diffs between the two branches
git switch destination_b         # moves to the branch that should receive the changes
git merge source_b               # merges source_b into the current branch (destination_b)

Branches enable you to isolate changes for specific features, bug fixes, or experiments without affecting the main codebase (usually the main or master branch). By creating branches, teams can work on multiple tasks concurrently, merging them back into the main branch once they are complete and tested.
One can think of the main branch as the ground truth. Each branch exists for a specific task, and once the task is complete and its result is confirmed to be correct, the branch is merged into the main branch.
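The branch workflow can be sketched end to end in a scratch repo. The branch name feature and the file notes.txt are hypothetical, and git switch assumes Git 2.23 or later.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init
git config user.name "Practice User"          # placeholder identity
git config user.email "practice@example.com"
printf 'This is a practice repo of Git\n' > README.txt
git add README.txt
git commit -m "Add README"
git branch -M main

git branch feature                 # create a branch for one specific task
git switch feature                 # move to it
printf 'Work in progress\n' > notes.txt
git add notes.txt
git commit -m "Add notes on the feature branch"

git diff --stat main feature       # what feature has that main does not

git switch main                    # move to the branch that receives the changes
git merge feature                  # merge the finished task into main
git branch                         # * marks main as the current branch
```

Until the merge, main is untouched by the work on feature, which is what keeps it usable as the ground truth.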
Pro Git by Ben Straub and Scott Chacon
To check your Personal Access Token (PAT) in GitHub, follow these steps:
If you cannot find or remember your PAT:
Verify PAT is Working.
To check if a PAT is valid:
If valid, this returns details about your GitHub user.
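One way to run such a check is sketched below. It assumes curl is installed and that the token is stored in an environment variable named GITHUB_PAT (a hypothetical name chosen for this example). The request goes to the GitHub REST API endpoint for the authenticated user.

```shell
# Assumption: the token lives in an environment variable named GITHUB_PAT.
# Nothing is sent over the network when the variable is empty.
check_pat() {
  if [ -z "${GITHUB_PAT:-}" ]; then
    echo "GITHUB_PAT is not set; skipping the check"
    return 0
  fi
  # A valid token returns the user's profile as JSON;
  # an invalid one returns a "Bad credentials" message instead.
  curl -s \
    -H "Authorization: Bearer ${GITHUB_PAT}" \
    -H "Accept: application/vnd.github+json" \
    https://api.github.com/user
}

check_pat
```

Reading the token from an environment variable keeps it out of the command itself, in line with the storage tip below.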
Tips:
Store PAT Securely: Use credential managers, environment variables, or secret management tools to keep it safe.
Limit Scopes: Create tokens with the minimal necessary permissions to reduce risks.
Use Fine-Grained Tokens: These provide more granular permissions compared to classic tokens.
Applied Economic Methods Study Group